$\ell_1$ Regularized Gradient Temporal-Difference Learning
Abstract
In this paper, we study Temporal Difference (TD) learning with linear value function approximation. It is well known that most TD learning algorithms are unstable when combined with linear function approximation and off-policy learning. The recent development of Gradient TD (GTD) algorithms has successfully addressed this problem. However, the success of GTD algorithms requires a set of well-chosen features, which are not always available. When the number of features is very large, GTD algorithms can overfit and become computationally expensive. To cope with this difficulty, regularization techniques, in particular $\ell_1$ regularization, have attracted significant attention in the development of TD learning algorithms. The present work combines GTD algorithms with $\ell_1$ regularization. We propose a family of $\ell_1$ regularized GTD algorithms that employ the well-known soft-thresholding operator. We investigate the convergence properties of the proposed algorithms and illustrate their performance with several numerical experiments.
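As a rough illustration of the ingredients named in the abstract, the sketch below combines a standard TDC-style gradient TD update with a soft-thresholding (proximal) step on the primary weights. It is a minimal sketch, not the paper's exact algorithm: the function names `soft_threshold` and `l1_tdc_step`, the step sizes `alpha` and `beta`, and the regularization weight `lam` are illustrative placeholders.

```python
import numpy as np

def soft_threshold(x, lam):
    """Soft-thresholding operator (proximal operator of the l1 norm):
    S_lam(x)_i = sign(x_i) * max(|x_i| - lam, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def l1_tdc_step(theta, w, phi, phi_next, reward, gamma, alpha, beta, lam):
    """One TDC-style gradient TD update followed by a soft-thresholding
    step on the primary weights (a plausible l1-regularized GTD update,
    sketched for illustration only).

    theta    : primary value-function weights
    w        : auxiliary weights used by the TDC correction term
    phi      : feature vector of the current state
    phi_next : feature vector of the next state
    """
    # TD error under the current linear value estimate
    delta = reward + gamma * phi_next.dot(theta) - phi.dot(theta)
    # TDC update of the primary weights, then l1 proximal (shrinkage) step
    theta = theta + alpha * (delta * phi - gamma * phi_next * (w.dot(phi)))
    theta = soft_threshold(theta, alpha * lam)
    # Auxiliary weight update
    w = w + beta * (delta - w.dot(phi)) * phi
    return theta, w
```

The soft-thresholding step shrinks small weights exactly to zero, which is how an $\ell_1$ penalty yields sparse value-function representations.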
Similar Papers
Mixing-Time Regularized Policy Gradient
Policy gradient reinforcement learning (PGRL) has been receiving substantial attention as a means of seeking stochastic policies that maximize cumulative reward. However, the learning speed of PGRL is known to decrease substantially when PGRL explores policies that induce Markov chains with long mixing times. We study a new approach of regularizing how the PGRL explores the policies by t...
Regularized Off-Policy TD-Learning
We present a novel l1 regularized off-policy convergent TD-learning method (termed RO-TD), which is able to learn sparse representations of value functions with low computational complexity. The algorithmic framework underlying RO-TD integrates two key ideas: off-policy convergent gradient TD methods, such as TDC, and a convex-concave saddle-point formulation of non-smooth convex optimization, w...
Regularized Least Squares Temporal Difference Learning with Nested ℓ2 and ℓ1 Penalization
The construction of a suitable set of features to approximate value functions is a central problem in reinforcement learning (RL). A popular approach to this problem is to use high-dimensional feature spaces together with least-squares temporal difference learning (LSTD). Although this combination allows for very accurate approximations, it often exhibits poor prediction performance because of ...
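For context on the LSTD component mentioned above, here is a minimal batch LSTD sketch with a plain ridge ($\ell_2$) term added for numerical stability. It is illustrative only and does not implement the nested ℓ2/ℓ1 penalization scheme of the cited paper; the function name `lstd` and the `ridge` parameter are hypothetical.

```python
import numpy as np

def lstd(phis, rewards, next_phis, gamma, ridge=1e-3):
    """Batch LSTD with a simple ridge term (illustrative sketch).

    Solves (A + ridge * I) theta = b, where
      A = sum_t phi_t (phi_t - gamma * phi_{t+1})^T
      b = sum_t phi_t * r_t

    phis, next_phis : arrays of shape (T, d) with current/next state features
    rewards         : array of shape (T,)
    """
    d = phis.shape[1]
    A = np.zeros((d, d))
    b = np.zeros(d)
    for phi, r, phi_next in zip(phis, rewards, next_phis):
        A += np.outer(phi, phi - gamma * phi_next)
        b += phi * r
    return np.linalg.solve(A + ridge * np.eye(d), b)
```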
L1 Regularized Linear Temporal Difference Learning
Several recent efforts in the field of reinforcement learning have focused attention on the importance of regularization, but the techniques for incorporating regularization into reinforcement learning algorithms, and the effects of these changes upon the convergence of these algorithms, are ongoing areas of research. In particular, little has been written about the use of regularization in onl...
Regularized Policy Iteration
In this paper we consider approximate policy-iteration-based reinforcement learning algorithms. In order to implement a flexible function approximation scheme we propose the use of non-parametric methods with regularization, providing a convenient way to control the complexity of the function approximator. We propose two novel regularized policy iteration algorithms by adding L2-regularization to...